Credit Card Users Churn Prediction

Author: Aidos Utegulov
Cohort: Feb 21

Problem Statement:

Thera Bank recently saw a steep decline in the number of users of its credit card. Credit cards are a good source of income for banks because of the different kinds of fees they charge, such as annual fees, balance transfer fees, cash advance fees, late payment fees, foreign transaction fees, and others. Some fees are charged to every user irrespective of usage, while others are charged under specified circumstances.

Customers leaving the credit card service would cause the bank to lose revenue, so the bank wants to analyze its customer data, identify the customers who are likely to leave, and understand their reasons for leaving, so that it can improve in those areas.

As a data scientist at Thera Bank, you need to come up with a classification model that will help the bank improve its services so that customers do not renounce their credit cards.

Objective

Explore and visualize the dataset. Build a classification model to predict whether a customer is going to churn. Optimize the model using appropriate techniques. Generate a set of insights and recommendations that will help the bank.

Data Dictionary:

Import necessary libraries

Load and view the dataset

Observations

Let's check the number of unique values in each column
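As a minimal sketch of this check, assuming the data is held in a pandas DataFrame: the toy frame and its column names below are illustrative assumptions, not the actual bank dataset.

```python
import pandas as pd

# Toy frame standing in for the bank data; column names are assumptions.
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M"],
    "Card_Category": ["Blue", "Blue", "Gold", "Blue"],
    "Total_Trans_Ct": [42, 33, 20, 42],
})

# Number of distinct values in each column.
unique_counts = df.nunique()
print(unique_counts)
```

Columns with only a handful of distinct values are usually candidates for categorical treatment, while a column that is unique per row (such as an ID) typically carries no predictive signal.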

Summary of the data

Let's check the count of each unique category in each of the categorical variables.
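One way to do this, sketched on a toy DataFrame: the column names and the explicit `cat_cols` list are assumptions made for illustration.

```python
import pandas as pd

# Toy frame; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "Gender": ["M", "F", "F", "M", "F"],
    "Card_Category": ["Blue", "Blue", "Gold", "Blue", "Silver"],
})

# Hypothetical list of categorical columns to inspect.
cat_cols = ["Gender", "Card_Category"]

# Count how often each category occurs in each column.
counts = {col: df[col].value_counts() for col in cat_cols}
for col, value_count in counts.items():
    print(value_count, "\n")
```

Passing `normalize=True` to `value_counts` would return shares instead of raw counts.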

Observations

EDA

Univariate

Observations on Total transaction amount and Total transaction count columns

Percentage values of categorical columns

Observations

Data Preprocessing

Bivariate Analysis

Observations

Observations

Feature Engineering

Split the data into train and test sets
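A minimal sketch of a stratified split with scikit-learn, assuming a binary churn target. The synthetic arrays, the 25% test size, and the random seed are assumptions rather than the values used in the analysis.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 20 rows, 2 features, 4 positive (churned) rows.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 16 + [1] * 4)

# stratify=y keeps the churn ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)
print(X_train.shape, X_test.shape)
```

Stratifying matters when the churn class is rare, because a purely random split can leave the test set with too few churners to evaluate recall reliably.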

Outlier treatment
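The outline does not say which treatment is applied, so the following is only one common option: capping values at the 1.5 × IQR whiskers. The helper `iqr_bounds`, the column name, and the toy values are assumptions.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) whisker bounds based on the interquartile range."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Toy train/test frames; the column name is an assumption.
train = pd.DataFrame({"Total_Trans_Amt": [10.0, 12.0, 11.0, 13.0, 12.0, 500.0]})
test = pd.DataFrame({"Total_Trans_Amt": [9.0, 1000.0]})

# Bounds are learned on the training data only, then applied to both sets.
lower, upper = iqr_bounds(train["Total_Trans_Amt"])
train["Total_Trans_Amt"] = train["Total_Trans_Amt"].clip(lower, upper)
test["Total_Trans_Amt"] = test["Total_Trans_Amt"].clip(lower, upper)
print(train["Total_Trans_Amt"].max(), test["Total_Trans_Amt"].max())
```

Computing the bounds after the split, on the training set alone, avoids leaking information from the test set into preprocessing.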

Encoding categorical variables
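A hedged sketch of one-hot encoding with `pandas.get_dummies`, assuming the encoding is done after the train/test split. The column names and values are illustrative assumptions.

```python
import pandas as pd

# Toy train/test frames; names and categories are assumptions.
X_train = pd.DataFrame({
    "Gender": ["M", "F", "F"],
    "Card_Category": ["Blue", "Gold", "Blue"],
})
X_test = pd.DataFrame({
    "Gender": ["F", "M"],
    "Card_Category": ["Blue", "Blue"],
})

# One-hot encode, dropping the first level of each variable.
X_train_enc = pd.get_dummies(X_train, drop_first=True)

# Encode the test set, then align it to the training columns so that
# categories missing from the test set become all-zero columns.
X_test_enc = pd.get_dummies(X_test, drop_first=True).reindex(
    columns=X_train_enc.columns, fill_value=0
)
print(X_test_enc)
```

The `reindex` step guards against the train and test sets ending up with different dummy columns when a category appears in only one of them.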

Building the model

Logistic Regression

Let's evaluate the model performance by using KFold and cross_val_score
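A minimal sketch of this evaluation on synthetic data. Using `StratifiedKFold` instead of plain `KFold`, scoring on recall, and the dataset parameters are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic imbalanced data standing in for the churn dataset.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=1
)

model = LogisticRegression(max_iter=1000)

# Stratified folds keep the class ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Recall is assumed here because missing a churner is the costly error.
scores = cross_val_score(model, X, y, scoring="recall", cv=cv)
print(scores, scores.mean())
```

The spread of the fold scores gives a rough sense of how stable the model's recall is before it is ever evaluated on the held-out test set.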

Observations

Oversampling train data using SMOTE
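To illustrate the idea behind SMOTE, the sketch below implements its core interpolation step in NumPy. It is a simplified teaching version, not the imbalanced-learn implementation; the function name `smote_sketch` and its parameters are hypothetical.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority
    samples and their k nearest minority neighbours (requires k < len(X_min))."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances among minority samples.
    diffs = X_min[:, None, :] - X_min[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbors = np.argsort(dists, axis=1)[:, :k]
    # Pick a base sample and one of its neighbours for each new row.
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbors[base, rng.integers(0, k, size=n_new)]
    # Place the new point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority-class rows (two features).
X_minority = np.array(
    [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5], [1.8, 1.6], [2.1, 2.4]]
)
synthetic = smote_sketch(X_minority, n_new=10)
print(synthetic.shape)
```

Oversampling should be applied to the training set only, so that the test set keeps the original class distribution.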

Logistic Regression on oversampled data

Regularization

Observations

Undersampling train data
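SMOTE generates synthetic minority rows, so it cannot shrink the majority class. As an assumption about what this step could look like, the sketch below uses plain random undersampling in NumPy on toy data.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training data: 16 majority rows (0) and 4 minority rows (1).
X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0] * 16 + [1] * 4)

minority_idx = np.flatnonzero(y == 1)
majority_idx = np.flatnonzero(y == 0)

# Randomly keep as many majority rows as there are minority rows.
kept_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
keep = np.sort(np.concatenate([kept_majority, minority_idx]))

X_under, y_under = X[keep], y[keep]
print(np.bincount(y_under))
```

Undersampling balances the classes at the cost of discarding majority-class rows, which can hurt when the training set is small.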

Logistic Regression on undersampled data

Observations

Building a Pipeline
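A minimal sketch of a scikit-learn pipeline that chains scaling with a classifier. The choice of `StandardScaler`, `GradientBoostingClassifier`, and the synthetic data are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic imbalanced data standing in for the churn dataset.
X, y = make_classification(
    n_samples=300, n_features=8, weights=[0.8, 0.2], random_state=1
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y
)

# Each step is a (name, estimator) pair; the last step is the model.
pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("model", GradientBoostingClassifier(n_estimators=50, random_state=1)),
])

pipe.fit(X_train, y_train)
test_pred = pipe.predict(X_test)
print(pipe.score(X_test, y_test))
```

Because the scaler is fitted inside the pipeline, it only ever sees training data, including within each cross-validation fold.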

Observations

Hyperparameter Tuning for 3 best models

AdaBoost

GridSearchCV
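A hedged sketch of an exhaustive grid search over AdaBoost hyperparameters. The grid values, the recall scoring, and the synthetic data are assumptions, not the settings used in the analysis.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic imbalanced data standing in for the churn dataset.
X, y = make_classification(
    n_samples=200, n_features=8, weights=[0.8, 0.2], random_state=1
)

# Every combination in the grid is evaluated (2 x 2 = 4 candidates).
param_grid = {
    "n_estimators": [10, 30],
    "learning_rate": [0.1, 1.0],
}

grid = GridSearchCV(
    AdaBoostClassifier(random_state=1),
    param_grid=param_grid,
    scoring="recall",
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

Grid search is thorough but its cost grows multiplicatively with every added parameter value.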

RandomizedSearchCV
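A comparable sketch with randomized search, which samples a fixed number of candidates instead of trying every combination. The candidate values, `n_iter`, and the synthetic data are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced data standing in for the churn dataset.
X, y = make_classification(
    n_samples=200, n_features=8, weights=[0.8, 0.2], random_state=1
)

# 4 x 4 = 16 possible combinations, of which only n_iter are sampled.
param_distributions = {
    "n_estimators": [10, 20, 30, 40],
    "learning_rate": [0.05, 0.1, 0.5, 1.0],
}

search = RandomizedSearchCV(
    AdaBoostClassifier(random_state=1),
    param_distributions=param_distributions,
    n_iter=5,
    scoring="recall",
    cv=3,
    random_state=1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Randomized search trades completeness for speed, which usually makes it the more practical choice when the search space is large.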

Gradient Boosting

GridSearchCV

RandomizedSearchCV

XGBoost

GridSearchCV

RandomizedSearchCV

Comparing all models

Conclusions and Recommendations

Business Insights